Search | VHL Regional Portal

Discriminative machine learning for maximal representative subsampling.

Hauptmann, Tony; Fellenz, Sophie; Nathan, Laksan; Tüscher, Oliver; Kramer, Stefan.

Sci Rep ; 13(1): 20925, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-38017053

ABSTRACT

Biased population samples pose a prevalent problem in the social sciences. Therefore, we present two novel methods that are based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, named maximum representative subsampling (MRS), uses a classifier to iteratively remove instances from the biased data set, by assigning them a sample weight of 0, until it aligns with the representative one. The second method, Soft-MRS, is a variant of MRS that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in social science, amongst others, and provide a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.
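The iterative removal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression classifier, the in-sample AUROC stopping rule, and the parameters `drop_per_round`, `auc_target`, and `min_keep` are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def fit_logistic(X, y, steps=300, lr=0.5):
    """Plain gradient-descent logistic regression; returns weights incl. bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Probability that each row belongs to the biased sample (label 1)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); ties are not specially handled."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def mrs_sketch(biased, representative, drop_per_round=5,
               auc_target=0.55, min_keep=20):
    """Return 0/1 sample weights for `biased` (hypothetical stopping rule)."""
    weights = np.ones(len(biased))
    while weights.sum() > min_keep:
        kept = np.flatnonzero(weights)
        X = np.vstack([biased[kept], representative])
        y = np.concatenate([np.ones(len(kept)), np.zeros(len(representative))])
        scores = predict_proba(fit_logistic(X, y), X)
        # Assumption: stop once the classifier can barely tell the sets apart.
        if auroc(scores, y) <= auc_target:
            break
        # Remove the kept instances that look most typical of the biased set.
        worst = kept[np.argsort(scores[:len(kept)])[-drop_per_round:]]
        weights[worst] = 0.0
    return weights
```

A Soft-MRS-style variant would shrink the weights of the highest-scoring instances rather than setting them to 0; the abstract does not give the update rule, so it is not sketched here.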

A fair experimental comparison of neural network architectures for latent representations of multi-omics for drug response prediction.

Hauptmann, Tony; Kramer, Stefan.

BMC Bioinformatics ; 24(1): 45, 2023 Feb 14.

Article in English | MEDLINE | ID: mdl-36788531

ABSTRACT

BACKGROUND: Recent years have seen a surge of novel neural network architectures for the integration of multi-omics data for prediction. Most of the architectures include either encoders alone or encoders and decoders, i.e., autoencoders of various sorts, to transform multi-omics data into latent representations. One important parameter is the depth of integration: the point at which the latent representations are computed or merged, which can be either early, intermediate, or late. The literature on integration methods is growing steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. RESULTS: We developed a comparison framework that trains and optimizes multi-omics integration methods under equal conditions. We incorporated early integration, PCA, and four recently published deep learning methods: MOLI, Super.FELT, OmiEmbed, and MOMA. Further, we devised a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a public drug response data set with multiple omics data (somatic point mutations, somatic copy number profiles, and gene expression profiles) that was obtained from cell lines, patient-derived xenografts, and patient samples. Our experiments confirmed that early integration has the lowest predictive performance. Overall, architectures that integrate triplet loss achieved the best results. Statistical differences can rarely be observed overall; however, in terms of the average ranks of methods, Super.FELT consistently performs best in a cross-validation setting and Omics Stacking best in an external test set setting. CONCLUSIONS: We recommend that researchers follow fair comparison protocols, as suggested in the paper. When faced with a new data set, Super.FELT is a good option in the cross-validation setting, and Omics Stacking in the external test set setting. Statistical significance is hardly observable, despite trends in the algorithms' rankings. Future work on refined methods for transfer learning tailored for this domain may improve the situation for external test sets. The source code of all experiments is available under https://github.com/kramerlab/Multi-Omics_analysis.
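The notion of integration depth in the abstract can be made concrete with a small sketch. It is not taken from the paper's code base: the encoders and predictors are untrained random linear maps standing in for trained networks, and the function names and `latent_dim` are hypothetical. Only the point at which the omics are merged differs between the three functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_encoder(n_in, n_out):
    """Stand-in for a trained encoder: fixed random linear map plus tanh."""
    W = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
    return lambda X: np.tanh(X @ W)

def random_predictor(n_in):
    """Stand-in for a trained classifier head: random linear map plus sigmoid."""
    w = rng.normal(size=n_in) / np.sqrt(n_in)
    return lambda Z: 1.0 / (1.0 + np.exp(-(Z @ w)))

def early_integration(omics, latent_dim=8):
    # Merge raw features first, then encode and predict once.
    X = np.hstack(omics)
    Z = random_encoder(X.shape[1], latent_dim)(X)
    return random_predictor(latent_dim)(Z)

def intermediate_integration(omics, latent_dim=8):
    # Encode each omics type separately, then merge the latent representations.
    latents = [random_encoder(X.shape[1], latent_dim)(X) for X in omics]
    Z = np.hstack(latents)
    return random_predictor(Z.shape[1])(Z)

def late_integration(omics, latent_dim=8):
    # Encode and predict per omics type, then merge the predictions.
    preds = [
        random_predictor(latent_dim)(random_encoder(X.shape[1], latent_dim)(X))
        for X in omics
    ]
    return np.mean(preds, axis=0)
```

Each function takes a list of sample-by-feature matrices, one per omics type, and returns one score per sample.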

Subjects

Multiomics , Neural Networks, Computer , Humans , Algorithms , Transcriptome , Software

Four-dimensional trapped ion mobility spectrometry lipidomics for high throughput clinical profiling of human blood samples.

Lerner, Raissa; Baker, Dhanwin; Schwitter, Claudia; Neuhaus, Sarah; Hauptmann, Tony; Post, Julia M; Kramer, Stefan; Bindila, Laura.

Nat Commun ; 14(1): 937, 2023 Feb 20.

Article in English | MEDLINE | ID: mdl-36806650

ABSTRACT

Lipidomics encompassing automated lipid extraction, a four-dimensional (4D) feature selection strategy for confident lipid annotation as well as reproducible and cross-validated quantification can expedite clinical profiling. Here, we determine 4D descriptors (mass to charge, retention time, collision cross section, and fragmentation spectra) of 200 lipid standards and 493 lipids from reference plasma via trapped ion mobility mass spectrometry to enable the implementation of stringent criteria for lipid annotation. We use 4D lipidomics to confidently annotate 370 lipids in reference plasma samples and 364 lipids in serum samples, and reproducibly quantify 359 lipids using level-3 internal standards. We show the utility of our 4D lipidomics workflow for high-throughput applications by reliable profiling of intra-individual lipidome phenotypes in plasma, serum, whole blood, venous and finger-prick dried blood spots.
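To illustrate what annotation against 4D descriptors could look like, the following sketch checks a measured feature against a reference library on all four dimensions. The tolerance values (`mz_ppm`, `rt_tol`, `ccs_pct`, `min_shared`), the class and function names, and the matching rule itself are assumptions for illustration; the abstract does not state the criteria the authors used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptor4D:
    name: str
    mz: float          # mass-to-charge ratio
    rt_min: float      # retention time in minutes
    ccs: float         # collision cross section
    fragments: tuple   # fragment m/z values from the fragmentation spectrum

def within_ppm(observed, reference, ppm):
    """True if the relative deviation is within `ppm` parts per million."""
    return abs(observed - reference) / reference * 1e6 <= ppm

def matches(feature, ref, mz_ppm=10.0, rt_tol=0.2, ccs_pct=2.0, min_shared=1):
    """All four descriptors must agree (hypothetical tolerances)."""
    if not within_ppm(feature.mz, ref.mz, mz_ppm):
        return False
    if abs(feature.rt_min - ref.rt_min) > rt_tol:
        return False
    if abs(feature.ccs - ref.ccs) / ref.ccs * 100.0 > ccs_pct:
        return False
    # Count measured fragments that hit any reference fragment.
    shared = sum(
        any(within_ppm(f, r, mz_ppm) for r in ref.fragments)
        for f in feature.fragments
    )
    return shared >= min_shared

def annotate(feature, library, **tolerances):
    """Return names of all library entries consistent with the feature."""
    return [ref.name for ref in library if matches(feature, ref, **tolerances)]
```

Requiring agreement on every dimension means that two entries with the same mass but different retention time or collision cross section are not confused with each other.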

Subjects

Lipidomics , Lipids , Humans , Lipids/chemistry , Lipidomics/methods , Ion Mobility Spectrometry , Workflow
